Our data comes from an open dataset (Echeverría et al., n.d.) connected to a paper validating the Spanish version of the Mental Health Continuum-Short Form Questionnaire [MHC-SF; Echeverría et al. (2017)].
Code
### import data - this is just sample code, the files do not existlibrary(readxl)# for reading excel filesdf.all<-read_excel("data/data_sMHCSF_Echeverria2017.xlsx")df<-df.all### create dif variablesdif.sex<-factor(df$Sex)df$Sex<-NULL# check levels with #levels(dif.sex)### label gender variable as factor# dif.gender <- factor(dif.gender,# levels = c(1,2,3),# labels = c("Female", "Male", "Other/missing response"))### Load item information# make sure that variable names in df match with itemlabels$itemnritemlabels<-read_excel("data/itemlabels_MHC_SF.xlsx")%>%mutate(item =str_squish(item))# match df variable names to itemlabels$itemnr variablenames(df)<-itemlabels$itemnr# recode response categories to numbersdf<-df%>%mutate(across(everything(), ~car::recode(.x,"'Never'=0;'1 or 2 times a month'=1;'About 1 time a week'=2;'About 2 or 3 times a week'=3;'Almost daily'=4;'Daily'=5", as.factor =FALSE)))
Code
##### Optionally: filter participants based on missing data##### Before filtering out participants, you should check the missing data structure using RImissing() and RImissingP()# If you want to include participants with missing data, input the minimum number of items responses that a participant should have to be included in the analysis:# min.responses <- 3# # # Select the variables we will work with, and filter out respondents with a lot of missing data# df <- df %>% # select(starts_with("item"),Sex,Age) %>% # variables that start with "item", and DIF-variables Sex and Age# filter(length(itemlabels$itemnr)-rowSums(is.na(.[itemlabels$itemnr])) >= min.responses) # include only respondents with data for at least 3 items# # #---- OR just filter out all respondents with any missing data----# df <- df %>% # select(starts_with("WAAQ"),Sex,Age) %>% # #mutate(across(where(is.character), ~ as.numeric(.x))) %>% # if data is input as characters, we need to convert to numeric# na.omit()
Code
# optionally, load RISE ggplot theme and color palettes and set the theme as default.# just comment out the row below if you desire different themingsource("RISE_theme.R")
The eRm package, which uses Conditional Maximum Likelihood (CML) estimation, will be used primarily. For this analysis, the Partial Credit Model will be used.
itemnr
item
mhc1
Happy
mhc2
Interested in life
mhc3
Satisfied with your life
mhc4
That you had something important to contribute to society?
mhc5
That you belonged to a community?
mhc6
That our society is becoming a better place for people like you?
mhc7
That people are basically good?
mhc8
That the way our society works makes sense to you?
mhc9
That you liked most parts of your personality?
mhc10
Good at managing the responsibilities of your daily life?
mhc11
That you had warm and trusting relationships with others?
mhc12
That you had experiences that challenged you to grow and become a better person?
mhc13
Confident to think or express your own ideas and opinions?
mhc14
That your life has a sense of direction or meaning to it?
Review the documentation for further details, using ?LRtest in your R console panel in Rstudio. There is also a plotting function, plotGOF() that may be of interest.
itemnr
item
mhc1
Happy
mhc4
That you had something important to contribute to society?
mhc9
That you liked most parts of your personality?
mhc10
Good at managing the responsibilities of your daily life?
mhc11
That you had warm and trusting relationships with others?
mhc12
That you had experiences that challenged you to grow and become a better person?
mhc13
Confident to think or express your own ideas and opinions?
mhc14
That your life has a sense of direction or meaning to it?
Values highlighted in red are above the chosen cutoff 0.5 logits. Background color brown and blue indicate the lowest and highest values among the DIF groups.
Values highlighted in red are above the chosen cutoff 0.5 logits. Background color brown and blue indicate the lowest and highest values among the DIF groups.
library(lordif)g_dif<-lordif(as.data.frame(df), as.numeric(dif.sex), # make sure that the data is in a dataframe-object and that the DIF variable is numeric criterion =c("Chisqr"), alpha =0.01, beta.change =0.1, model ="GPCM", R2.change =0.02)g_dif_sum<-summary(g_dif)
We can review the results regarding uniform/non-uniform DIF by looking at the chi* columns. Uniform DIF is indicated by column chi12 and non-uniform DIF by chi23, while column chi13 represents “an overall test of”total DIF effect” (Choi, Gibbons, and Crane 2011).
WARNING: One or more problems were discovered while enumerating dependencies.
# /Users/magnuspjo/Documents/Hub/rasch_samc_mhcsf/analysis2.qmd --------------
Error: <text>:213:34: unexpected ')'
212: df <- df %>%
213: filter(50-rowSums(is.na(.)) >= )
^
Please see `?renv::dependencies` for more information.
Mair and Hatzinger (2007b); Mair and Hatzinger (2007a); Hatzinger and Rusch (2009); Rusch, Maier, and Hatzinger (2013); Koller, Maier, and Hatzinger (2015); Debelak and Koller (2019)
Trepte and Verbeet (2010); Strobl, Wickelmaier, and Zeileis (2011); Strobl, Kopf, and Zeileis (2015); Komboz, Zeileis, and Strobl (2018); Wickelmaier and Zeileis (2018)
Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2024. rmarkdown: Dynamic Documents for r. https://github.com/rstudio/rmarkdown.
Chalmers, R. Philip. 2012. “mirt: A Multidimensional Item Response Theory Package for the R Environment.”Journal of Statistical Software 48 (6): 1–29. https://doi.org/10.18637/jss.v048.i06.
Choi, Seung W., Laura E. Gibbons, and Paul K. Crane. 2011. “Lordif: AnRPackage for DetectingDifferentialItemFunctioningUsingIterativeHybridOrdinalLogisticRegression/ItemResponseTheory and MonteCarloSimulations.”Journal of Statistical Software 39 (1): 1–30. https://doi.org/10.18637/jss.v039.i08.
Choi, Seung W., with contributions from Laura E. Gibbons, and Paul K. Crane. 2016. lordif: Logistic Ordinal Regression Differential Item Functioning Using IRT. https://CRAN.R-project.org/package=lordif.
Debelak, Rudolf, and Ingrid Koller. 2019. “Testing the Local Independence Assumption of the Rasch Model with Q3-Based Nonparametric Model Tests.”Applied Psychological Measurement 44. https://doi.org/10.1177/0146621619835501.
Echeverría, Guadalupe, Manuel Torres-Sahli, Nuria Pedrals, Oslando Padilla, Attilio Rigotti, and Marcela Bitran. 2017. “Validation of a Spanish Version of the Mental Health Continuum-Short Form Questionnaire.”Psicothema 29 (February): 96–102. https://doi.org/10.7334/psicothema2016.3.
Koller, Ingrid, Marco Maier, and Reinhold Hatzinger. 2015. “An Empirical Power Analysis of Quasi-Exact Tests for the Rasch Model: Measurement Invariance in Small Samples.”Methodology 11. https://doi.org/10.1027/1614-2241/a000090.
Komboz, Basil, Achim Zeileis, and Carolin Strobl. 2018. “Tree-Based Global Model Tests for Polytomous Rasch Models.”Educational and Psychological Measurement 78 (1): 128–66. https://doi.org/10.1177/0013164416664394.
Mair, Patrick, and Reinhold Hatzinger. 2007a. “CML Based Estimation of Extended Rasch Models with the eRm Package in r.”Psychology Science 49. https://doi.org/10.18637/jss.v020.i09.
———. 2007b. “Extended Rasch Modeling: The eRm Package for the Application of IRT Models in r.”Journal of Statistical Software 20. https://doi.org/10.18637/jss.v020.i09.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rusch, Thomas, Marco Maier, and Reinhold Hatzinger. 2013. “Linear Logistic Models with Relaxed Assumptions in r.” In Algorithms from and for Nature and Life, edited by Berthold Lausen, Dirk van den Poel, and Alfred Ultsch. Studies in Classification, Data Analysis, and Knowledge Organization. New York: Springer. https://doi.org/10.1007/978-3-319-00035-0_34.
Strobl, Carolin, Julia Kopf, and Achim Zeileis. 2015. “Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model.”Psychometrika 80 (2): 289–316. https://doi.org/10.1007/s11336-013-9388-3.
Strobl, Carolin, Florian Wickelmaier, and Achim Zeileis. 2011. “Accounting for Individual Differences in Bradley-Terry Models by Means of Recursive Partitioning.”Journal of Educational and Behavioral Statistics 36 (2): 135–53. https://doi.org/10.3102/1076998609359791.
Tendeiro, Jorge N., Rob R. Meijer, and A. Susan M. Niessen. 2016. “PerFit: An R Package for Person-Fit Analysis in IRT.”Journal of Statistical Software 74 (5): 1–27. https://doi.org/10.18637/jss.v074.i05.
Trepte, Sabine, and Markus Verbeet, eds. 2010. Allgemeinbildung in Deutschland – Erkenntnisse Aus Dem SPIEGELStudentenpisa-Test. Wiesbaden: VS Verlag.
Wickelmaier, Florian, and Achim Zeileis. 2018. “Using Recursive Partitioning to Account for Parameter Heterogeneity in Multinomial Processing Tree Models.”Behavior Research Methods 50 (3): 1217–33. https://doi.org/10.3758/s13428-017-0937-z.
Wickham, Hadley. 2007. “Reshaping Data with the Reshape Package.”Journal of Statistical Software 21 (12). https://www.jstatsoft.org/v21/i12/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.”Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
William Revelle. 2024. psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University. https://CRAN.R-project.org/package=psych.
Xie, Yihui. 2014. “knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.org/knitr/.
———. 2023. knitr: A General-Purpose Package for Dynamic Report Generation in r. https://yihui.org/knitr/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.
---title: "Rasch analysis"subtitle: "Template file"title-block-banner: "#009ca6"title-block-banner-color: "#FFFFFF"author: name: Magnus Johansson affiliation: RISE Research Institutes of Sweden affiliation-url: https://www.ri.se/en/kbm orcid: 0000-0003-1669-592Xdate: last-modifieddate-format: isoalways_allow_html: trueformat: html: toc: true toc-depth: 3 toc-title: "Table of contents" embed-resources: true standalone: true page-layout: full mainfont: 'Lato' monofont: 'Roboto Mono' code-overflow: wrap code-fold: true code-tools: true code-link: true number-sections: true fig-dpi: 96 layout-align: left linestretch: 1.6 theme: - materia - custom.scss css: styles.css license: CC BY pdf: papersize: a4 documentclass: report execute: echo: true warning: false message: false cache: trueeditor_options: markdown: wrap: 72 chunk_output_type: consolebibliography: - references.bib - grateful-refs.bib---```{r}#| label: setup# one package below requires that you use devtools to install them manually:# first install devtools by# install.packages('devtools')library(RISEkbmRasch) # devtools::install_github("pgmj/RISEkbmRasch")library(grateful)library(ggrepel)library(car)library(kableExtra)library(readxl)library(tidyverse)library(eRm)library(mirt)library(psych)library(psychotree)library(matrixStats)library(reshape)library(knitr)library(patchwork)library(formattable) library(glue)### optional libraries#library(TAM)#library(skimr)#library(janitor)### some commands exist in multiple packages, here we define preferred ones that are frequently usedselect <- dplyr::selectcount <- dplyr::countrecode <- car::recoderename <- dplyr::rename```Our data comes from an open dataset [@echeverría] connected to a paper validating the Spanish version of the Mental Health Continuum-Short Form Questionnaire [MHC-SF; @echeverría2017].```{r}### import data - this is just sample code, the files do not existlibrary(readxl) # for reading excel filesdf.all <-read_excel("data/data_sMHCSF_Echeverria2017.xlsx")df <- df.all### create dif variablesdif.sex <-factor(df$Sex)df$Sex <-NULL# check levels with #levels(dif.sex)### label gender variable as factor# dif.gender <- factor(dif.gender,# levels = c(1,2,3),# labels = c("Female", "Male", "Other/missing response"))### Load item information# make sure that variable names in df match with itemlabels$itemnritemlabels <-read_excel("data/itemlabels_MHC_SF.xlsx") %>%mutate(item =str_squish(item))# match df variable names to itemlabels$itemnr variablenames(df) <- itemlabels$itemnr# recode response categories to numbersdf <- df %>%mutate(across(everything(), ~ car::recode(.x,"'Never'=0;'1 or 2 times a month'=1;'About 1 time a week'=2;'About 2 or 3 times a week'=3;'Almost daily'=4;'Daily'=5", as.factor =FALSE)))``````{r}##### Optionally: filter participants based on missing data##### Before filtering out participants, you should check the missing data structure using RImissing() and RImissingP()# If you want to include participants with missing data, input the minimum number of items responses that a participant should have to be included in the analysis:# min.responses <- 3# # # Select the variables we will work with, and filter out respondents with a lot of missing data# df <- df %>% # select(starts_with("item"),Sex,Age) %>% # variables that start with "item", and DIF-variables Sex and Age# filter(length(itemlabels$itemnr)-rowSums(is.na(.[itemlabels$itemnr])) >= min.responses) # include only respondents with data for at least 3 items# # #---- OR just filter out all respondents with any missing data----# df <- df %>% # select(starts_with("WAAQ"),Sex,Age) %>% # #mutate(across(where(is.character), ~ as.numeric(.x))) %>% # if data is input as characters, we need to convert to numeric# na.omit()``````{r}# optionally, load RISE ggplot theme and color palettes and set the theme as default.# just comment out the row below if you desire different themingsource("RISE_theme.R")```## All items in the analysis```{r}RIlistitems(df)```## Demographics```{r}#| layout-ncol: 2RIdemographics(dif.sex, "Gender")```### Descriptives of raw dataResponse distribution for all items are summarized below.```{r}#| tbl-cap: "Total number of responses for all items"RIallresp(df)```### Descriptives - item level```{r}#| column: marginRIlistItemsMargin(df, fontsize =12)```::: panel-tabset#### Tile plot```{r}RItileplot(df)```#### Stacked bars```{r}RIbarstack(df)```#### Barplots```{r}#| layout-ncol: 2RIbarplot(df)```#### Missing responses```{r}RImissing(df)```:::## Rasch analysis 1The eRm package, which uses Conditional Maximum Likelihood (CML)estimation, will be used primarily. For this analysis, the PartialCredit Model will be used.```{r}#| column: margin#| echo: falseRIlistItemsMargin(df, fontsize =13)```::: panel-tabset### Item fit```{r}RIitemfitPCM2(df,samplesize =250,nsamples =4)```### PCA```{r}#| tbl-cap: "PCA of Rasch model residuals"RIpcmPCA(na.omit(df))```### Residual correlations```{r}RIresidcorr(df, cutoff =0.2)```### 1st contrast loadings```{r}RIloadLoc(df)```### Many items ICC```{r}mirt(df, model=1, itemtype='Rasch', verbose =FALSE) %>%plot(type="trace", as.table =TRUE, theta_lim =c(-6,6))```### Fewer items ICC```{r}RIitemCats(df)```### Targeting```{r}#| fig-height: 8# increase fig-height above as needed, if you have many itemsRItargeting(df,bins =50)```### Item hierarchy```{r}#| fig-height: 8RIitemHierarchy(df)```:::mhc5 is high in item fit, and also deviant in loadings on 1st residual contrast.PCA eigenvalue is above 2, there clearly is multidimensionality.Two clusters in residual correlations:- items 1-3- items 6-8Response category thresholds are disordered for many items.- 'Never'=0;- '1 or 2 times a month'=1;- 'About 1 time a week'=2;- 'About 2 or 3 times a week'=3;- 'Almost daily'=4;- 'Daily'=5```{r}df %>%mutate(across(everything(), ~recode(.x, "3=2;4=3;5=4"))) %>%RItileplot()df.backup <- dfdf <- df %>%mutate(across(everything(), ~recode(.x, "3=2;4=3;5=4")))```### ICC check after recode```{r}mirt(df, model=1, itemtype='Rasch', verbose =FALSE) %>%plot(type="trace", as.table =TRUE, theta_lim =c(-6,6))``````{r}removed.items <-c("mhc5")#df$mhc5 <- NULLdf <- df %>%select(!any_of(removed.items))```## Rasch 2We removed item mhc5 due to high item fit and deviant 1st contrast loadings.Response category 2 and 3 were merged, this is now "1-3 times a week".```{r}#| column: margin#| echo: falseRIlistItemsMargin(df, fontsize =13)```::: panel-tabset### Item fit```{r}RIitemfitPCM2(df,samplesize =250,nsamples =4)```### PCA```{r}#| tbl-cap: "PCA of Rasch model residuals"RIpcmPCA(na.omit(df))```### Residual correlations```{r}RIresidcorr(df, cutoff =0.2)```### 1st contrast loadings```{r}RIloadLoc(df)```### Many items ICC```{r}mirt(df, model=1, itemtype='Rasch', verbose =FALSE) %>%plot(type="trace", as.table =TRUE, theta_lim =c(-6,6))```### Fewer items ICC```{r}RIitemCats(df)```### Targeting```{r}#| fig-height: 8# increase fig-height above as needed, if you have many itemsRItargeting(df,bins =50)```### Item hierarchy```{r}#| fig-height: 8RIitemHierarchy(df)```:::We still have two clusters of residual correlations.- Items 1-3- 6-8It looks like item 1 has the best targeting and item fit, we remove 2 and 3.Item 7 has better distance between response categories, we remove 6 and 8.```{r}removed.items <-c("mhc5","mhc2","mhc3","mhc6","mhc8")#df$mhc5 <- NULLdf.backup2 <- dfdf <- df %>%select(!any_of(removed.items))```## Rasch 3```{r}#| column: margin#| echo: falseRIlistItemsMargin(df, fontsize =13)```::: panel-tabset### Item fit```{r}RIitemfitPCM2(df,samplesize =250,nsamples =4)```### PCA```{r}#| tbl-cap: "PCA of Rasch model residuals"RIpcmPCA(na.omit(df))```### Residual correlations```{r}RIresidcorr(df, cutoff =0.2)```### 1st contrast loadings```{r}RIloadLoc(df)```### Many items ICC```{r}mirt(df, model=1, itemtype='Rasch', verbose =FALSE) %>%plot(type="trace", as.table =TRUE, theta_lim =c(-6,6))```### Fewer items ICC```{r}RIitemCats(df)```### Targeting```{r}#| fig-height: 8# increase fig-height above as needed, if you have many itemsRItargeting(df,bins =50)```### Item hierarchy```{r}#| fig-height: 8RIitemHierarchy(df)```:::Item 7 has high item fit and deviant loadings on the 1st residual contrast. We will remove it.No residual correlations remain.```{r}removed.items <-c("mhc5","mhc2","mhc3","mhc6","mhc8","mhc7")#df$mhc5 <- NULLdf.backup3 <- dfdf <- df %>%select(!any_of(removed.items))```## Rasch 4```{r}#| column: margin#| echo: falseRIlistItemsMargin(df, fontsize =13)```::: panel-tabset### Item fit```{r}RIitemfitPCM2(df,samplesize =250,nsamples =4)```### PCA```{r}#| tbl-cap: "PCA of Rasch model residuals"RIpcmPCA(na.omit(df))```### Residual correlations```{r}RIresidcorr(df, cutoff =0.2)```### 1st contrast loadings```{r}RIloadLoc(df)```### Many items ICC```{r}mirt(df, model=1, itemtype='Rasch', verbose =FALSE) %>%plot(type="trace", as.table =TRUE, theta_lim =c(-6,6))```### Fewer items ICC```{r}RIitemCats(df)```### Targeting```{r}#| fig-height: 8# increase fig-height above as needed, if you have many itemsRItargeting(df,bins =50)```### Item hierarchy```{r}#| fig-height: 8RIitemHierarchy(df)```:::## LRT-based DIF```{r}erm.out <-PCM(df)LRtest(erm.out, splitcr = dif.sex) ```Review the documentation for further details, using `?LRtest` in your R console panel in Rstudio. There is also a plotting function, `plotGOF()` that may be of interest.```{r}#| column: margin#| echo: falseRIlistItemsMargin(df, fontsize =13)```::: panel-tabset#### Item location table```{r}RIdifTableLR(df, dif.sex)```#### Item location figure```{r}#| fig-height: 7RIdifFigureLR(df, dif.sex)```#### Item threshold table```{r}RIdifThreshTblLR(df, dif.sex)```#### Item threshold figure```{r}#| fig-height: 7RIdifThreshFigLR(df, dif.sex)```:::## DIF-analysis 2### Gender```{r}#| column: margin#| echo: falseRIlistItemsMargin(df, fontsize =13)```::: panel-tabset#### Table```{r}RIdifTable(df, dif.sex)```#### Locations```{r}RIdifFigure(df, dif.sex)```#### Thresholds```{r}RIdifFigThresh(df, dif.sex)```:::### Person location & infit ZSTD```{r}RIpfit(df)```## Test Information (Reliability)```{r}RItif(df)```## lordif```{r}#| results: hidelibrary(lordif)g_dif <-lordif(as.data.frame(df), as.numeric(dif.sex), # make sure that the data is in a dataframe-object and that the DIF variable is numericcriterion =c("Chisqr"), alpha =0.01, beta.change =0.1,model ="GPCM",R2.change =0.02)g_dif_sum <-summary(g_dif)``````{r}# threshold values for colorizing the table belowalpha =0.01beta.change =0.1R2.change =0.02g_dif_sum$stats %>%as.data.frame() %>%select(!all_of(c("item","df12","df13","df23"))) %>%round(3) %>%add_column(itemnr =names(df), .before ="ncat") %>%mutate(across(c(chi12,chi13,chi23), ~cell_spec(.x,color =case_when( .x < alpha ~"red",TRUE~"black" )))) %>%mutate(across(starts_with("pseudo"), ~cell_spec(.x,color =case_when( .x > R2.change ~"red",TRUE~"black" )))) %>%mutate(beta12 =cell_spec(beta12,color =case_when( beta12 > beta.change ~"red",TRUE~"black" ))) %>%kbl_rise()```We can review the results regarding uniform/non-uniform DIF by looking at the `chi*` columns. Uniform DIF is indicated by column `chi12` and non-uniform DIF by `chi23`, while column `chi13` represents "an overall test of "total DIF effect" [@choi_lordif_2011].## Item parameters```{r}RIitemparams(df)```## Transformation table```{r}RIscoreSE(df,score_range =c(-4,5))```## Ordinal/interval figure```{r}RIscoreSE(df, output ="figure", score_range =c(-4,5))```## Software used```{r}pkgs <-cite_packages(cite.tidyverse =TRUE, output ="table",bib.file ="grateful-refs.bib",include.RStudio =TRUE,out.dir =getwd())formattable(pkgs, table.attr ='class=\"table table-striped\" style="font-size: 15px; font-family: Lato; width: 80%"')```## References